Supplementary Notebook S7: Highly Summarized Plots for Assembling Figure 2


Requirements:

The following preprocessing and method runs must be completed (in order) before executing this notebook:

  1. 01-SimulatedDatasets.ipynb - Prepares simulated datasets with various perturbation scenarios.
  2. 02-runCOPF.R - Executes the COPF method on the simulated datasets.
  3. 03-runPeCorA.R - Executes the PeCorA method on the simulated datasets.
  4. 04-runProteoForge.R - Executes the ProteoForge method on the simulated datasets.
  5. 05-IdentificationBenchmark.ipynb - Generates the peptide identification benchmark figures, and combines results from all methods.
  6. 06-GroupingBenchmark.ipynb - Generates the peptide grouping benchmark figures, and combines results from all methods.

Data Information:

This notebook uses the final combined benchmarking results tables from the real-data benchmarking (../Benchmark/) and simulated benchmarking (./) analyses. Across one real-data setup and four simulated setups, two benchmarks (peptide identification and peptide grouping) were evaluated.


Purpose:

This notebook compiles key performance plots from previous analyses into a cohesive figure for publication. It focuses on summarizing the ROC curves and optimal F1/MCC thresholds across different perturbation scenarios and methods (COPF, PeCorA, ProteoForge). The final figure highlights the comparative strengths and weaknesses of each approach in detecting proteoform changes under various conditions. It is intentionally high-level and concise for the main text; the detailed plots and analyses remain in the supplementary notebooks and supplementary figures.


Setup

This section imports required libraries, configures display settings, and defines paths for data and figures.

Note: The HTML rendering of this notebook hides code cells by default. Click the "Code" buttons on the right to expand them.

Libraries

import os
import sys
import warnings

import numpy as np  # Numerical computing
import pandas as pd  # Data manipulation

import seaborn as sns  # R-like high-level plots
import matplotlib.pyplot as plt  # Python's base plotting library

sys.path.append('../')
from src import utils, plots

warnings.filterwarnings('ignore')

# Initialize the timer
startTime = utils.getTime()

Display Settings

The cell below configures pandas, matplotlib, and seaborn display options for improved readability of tables and figures, including color palettes and figure export settings.

# Define default colors and styles for plots
def_colors = [
    "#139593", "#fca311", "#e54f2a",
    "#c3c3c3", "#555555",
    "#690000", "#5f4a00", "#004549"
]

# Set seaborn style
sns.set_theme(
    style="white",
    context="paper",
    palette=def_colors,
    font_scale=1,
    rc={
        "figure.figsize": (6, 4),
        "font.family": "sans-serif",
        "font.sans-serif": ["Arial", "Ubuntu Mono"],
    }
)

# Figure Saving Settings
figure_formats = ["pdf"]
save_to_folder = True
transparent_bg = True
figure_dpi = 300

## Configure dataframe display
pd.set_option('display.float_format', lambda x: '%.4f' % x)
pd.set_option('display.max_rows', 50)
pd.set_option('display.max_columns', 50)
pd.set_option('display.width', 500)  # Set a wider display width

## Printing Settings
verbose = True

Data and Result Paths

Data and figures are organized in separate folders:

  • data_path — No single data directory is defined, as data is read from both the real-data benchmarking (../Benchmark/data/prepared/) and simulated benchmarking (./data/prepared/) analyses; paths are given explicitly at each file-reading step.
  • figure_path — Directory for generated plots and figures (figures/07-FigureAssembly/).
notebook_name = "07-FigureAssembly"
figure_path = f"./figures/{notebook_name}/"

# Create figure folder structure, if needed
if save_to_folder:
    for fmt in figure_formats:
        os.makedirs(figure_path + fmt + "/", exist_ok=True)

Global Variables

Defines constants and visual styling for consistent analysis and publication-quality figures:

  • seed — Random seed for reproducibility
  • pthr — P-value threshold (10⁻³) for statistical significance
  • method_palette / method_markers — Color and marker scheme for methods (COPF, PeCorA, ProteoForge)
  • mcc_thresholds / mcc_colors — MCC interpretation scale with semantic labels (Random → Almost Perfect)
# Global variables
seed = 42 # Seed for reproducibility  
pthr = 10**-3  # p-value threshold for significance
thresholds = list(utils.generate_thresholds(10.0, -15, 1, 0, 1, 0.1)) # Thresholds for the analysis

# Methods and their plotting styles
method_palette = {
    "COPF": "#139593",
    "PeCorA": "#fca311",
    "ProteoForge": "#e54f2a",
}
method_styles = {
    "COPF": "--",
    "PeCorA": "-.",
    "ProteoForge": ":",
}
method_markers = {
    "COPF": "o",
    "PeCorA": "s",
    "ProteoForge": "^",
}


# Matthews Correlation Coefficient (MCC) thresholds and colors
mcc_thresholds = {
    0.0 : 'Random',
    0.3 : 'Fair (0.3)',
    0.5 : 'Moderate (0.5)',
    0.7 : 'Strong (0.7)',
    0.9 : 'Almost Perfect (0.9)',
}
mcc_colors = {
    'Random': '#c3c3c3',             
    'Fair (0.3)': '#e0aaff',         
    'Moderate (0.5)': '#9d4edd',     
    'Strong (0.7)': '#5a189a',       
    'Almost Perfect (0.9)': '#240046' 
}

plots.color_palette(mcc_colors, save=False, name="MCC Interpretation Colors")
plots.color_palette(method_palette, save=False, name="Method Colors")
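The MCC interpretation scale above can also be applied programmatically. The sketch below defines a small hypothetical helper (not part of src.utils) that maps an MCC value to its label by taking the highest threshold the value reaches:

```python
# Hypothetical helper (illustration only): map an MCC value to its
# interpretation label using the scale defined above.
mcc_thresholds = {
    0.0: 'Random',
    0.3: 'Fair (0.3)',
    0.5: 'Moderate (0.5)',
    0.7: 'Strong (0.7)',
    0.9: 'Almost Perfect (0.9)',
}

def interpret_mcc(mcc):
    # Walk thresholds in ascending order; keep the last one that is met.
    label = 'Random'
    for thr, name in sorted(mcc_thresholds.items()):
        if mcc >= thr:
            label = name
    return label

print(interpret_mcc(0.65))  # Moderate (0.5)
```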

Figure 2: Method Benchmarking Overview

This figure compiles performance comparisons of ProteoForge against established methods (COPF, PeCorA) using both real-world SWATH-MS data and controlled simulations. The benchmarks evaluate two core tasks: Peptide Identification (detecting which peptides are perturbed) and Peptide Grouping (correctly assigning perturbed peptides to proteoform groups).

2.1 SWATH-MS Benchmark: Real-World Performance

Data Source: COPF manuscript benchmark dataset (analyzed in ../Benchmark/ notebooks)

Key Question: How do methods compare on experimentally validated proteoform perturbations?

swath_pepid = pd.read_feather("../Benchmark/data/results/peptide_identification_performance_data.feather")
swath_pepgrp = pd.read_feather("../Benchmark/data/results/peptide_grouping_performance_data.feather")
# Combine as swath with Benchmark column indicating pepID, or pepGrp
swath_pepgrp['Benchmark'] = 'Peptide Grouping'
swath_pepid['Benchmark'] = 'Peptide Identification'
swath_combined = pd.concat(
    [swath_pepid, swath_pepgrp], ignore_index=True
).rename(columns={
    'perturbation': 'Experiment',
    'method': 'Method',
})
# swath_combined['Experiment'].value_counts()
# Replace the Experiment to standardize the naming:
swath_combined['Experiment'] = swath_combined['Experiment'].replace({
    '1 Peptide': '1 Peptide',
    '2 Peptides': '2 Peptides',
    '%50 Peptides': '50% Peptides',
    'Random (1 to %50) Peptides': 'Random Peptides',
    'Random (2 to %50) Peptides': 'Random Peptides',
})
swath_combined['Experiment'].value_counts()
Experiment
2 Peptides         125
50% Peptides       125
Random Peptides    125
1 Peptide           75
Name: count, dtype: int64
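The label standardization above relies on pandas' dict-based replace, which also merges the two random-perturbation labels into a single category. A minimal standalone illustration:

```python
import pandas as pd

# Dict-based replace maps each old label to a new one; note that both
# 'Random (… to %50)' variants collapse into the same category.
s = pd.Series([
    '%50 Peptides',
    'Random (1 to %50) Peptides',
    'Random (2 to %50) Peptides',
])
merged = s.replace({
    '%50 Peptides': '50% Peptides',
    'Random (1 to %50) Peptides': 'Random Peptides',
    'Random (2 to %50) Peptides': 'Random Peptides',
})
print(merged.tolist())  # ['50% Peptides', 'Random Peptides', 'Random Peptides']
```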

MCC Performance Across Perturbation Scenarios

The barplot compares Matthews Correlation Coefficient (MCC) across four perturbation scenarios:

  • 1 Peptide / 2 Peptides: Single or paired peptide perturbations
  • Random Peptides: Variable number (2–50%) of peptides perturbed
  • 50% Peptides: Half of all peptides perturbed

Visual Elements:

  • Bars show mean MCC (± 95% CI) across thresholds
  • Scatter points indicate MCC at p-value threshold (10⁻³)
  • Horizontal reference lines denote MCC interpretation thresholds
from matplotlib.lines import Line2D
from matplotlib.gridspec import GridSpec

def compute_offset(n_methods, idx_method, total_width=0.8):
    # Calculates the horizontal offset for aligning elements with grouped bars.
    if n_methods <= 1:
        return 0.0
    single_width = total_width / n_methods
    center_offset = (n_methods - 1) / 2.0
    return (idx_method - center_offset) * single_width

def plot_benchmark_panel(ax, data, x_order, hue_order, palette, markers, p_threshold):
    # Generates a single panel with a bar plot, score annotations, and scatter points.
    sns.barplot(
        ax=ax, data=data, x='Experiment', y='MCC', hue='Method',
        order=x_order, hue_order=hue_order, palette=palette,
        edgecolor="black", linewidth=1.0, errorbar="ci",
        capsize=0.03, err_kws={'linewidth': 1.0}
    )

    n_methods = len(hue_order)
    pthr_data = data[data['threshold'] == p_threshold]

    for i, experiment in enumerate(x_order):
        for j, method in enumerate(hue_order):
            offset = compute_offset(n_methods, j)
            x_pos = i + offset

            mean_mcc = data[(data["Experiment"] == experiment) & (data["Method"] == method)]["MCC"].mean()
            if not np.isnan(mean_mcc):
                ax.text(
                    x_pos, -0.03, f"{mean_mcc:.2f}", color=palette.get(method, 'k'),
                    ha="center", va="top", fontsize=8, fontweight="bold", rotation=90,
                )

            method_pthr_data = pthr_data[(pthr_data['Method'] == method) & (pthr_data['Experiment'] == experiment)]
            if not method_pthr_data.empty:
                y_pos = method_pthr_data['MCC'].iloc[0]
                ax.scatter(
                    x_pos, y_pos, color=palette.get(method, 'k'), s=60,
                    edgecolor='black', linewidth=1.0, marker=markers.get(method, 'o'),
                    zorder=10, alpha=0.95
                )
    
    ax.grid(axis="y", linestyle="--", linewidth=0.5, alpha=0.7, color="gray")
    ax.set_xlabel('')
    ax.tick_params(axis='x', labelsize=9, rotation=90) # Rotate x-axis labels
    ax.tick_params(axis='y', labelsize=9)
    if ax.get_legend() is not None:
        ax.get_legend().remove()
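As a quick sanity check of the offset logic, the snippet below (repeating compute_offset so it runs standalone) shows where the pthr scatter markers land for a group of three bars:

```python
# Repeated from the helper above so this snippet is self-contained.
def compute_offset(n_methods, idx_method, total_width=0.8):
    # Horizontal offset of bar idx_method within a group of n_methods bars.
    if n_methods <= 1:
        return 0.0
    single_width = total_width / n_methods
    center_offset = (n_methods - 1) / 2.0
    return (idx_method - center_offset) * single_width

# For 3 methods, the bar centers sit at x - 0.2667, x, and x + 0.2667
offsets = [round(compute_offset(3, j), 4) for j in range(3)]
print(offsets)  # [-0.2667, 0.0, 0.2667]
```

The default total_width of 0.8 matches seaborn's default bar-group width, which keeps the scatter points centered on their bars.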

# Filter and categorize data for plotting
experiments_ordered = ['1 Peptide', '2 Peptides', 'Random Peptides', '50% Peptides']
methods_ordered_pepid = ['ProteoForge', 'PeCorA', 'COPF']
methods_ordered_pepgrp = ['ProteoForge', 'COPF']

subset_pepid = swath_combined[(swath_combined['Benchmark'] == 'Peptide Identification') & (swath_combined['Experiment'].isin(experiments_ordered))].copy()
subset_pepid['Experiment'] = pd.Categorical(subset_pepid['Experiment'], categories=experiments_ordered, ordered=True)
subset_pepid['Method'] = pd.Categorical(subset_pepid['Method'], categories=methods_ordered_pepid, ordered=True)

subset_pepgrp = swath_combined[(swath_combined['Benchmark'] == 'Peptide Grouping') & (swath_combined['Experiment'].isin(experiments_ordered))].copy()
subset_pepgrp['Experiment'] = pd.Categorical(subset_pepgrp['Experiment'], categories=experiments_ordered, ordered=True)
subset_pepgrp['Method'] = pd.Categorical(subset_pepgrp['Method'], categories=methods_ordered_pepgrp, ordered=True)
    
# --- Figure Creation ---
fig = plt.figure(figsize=(12, 5))
# Adjust width ratios: ax1 wider (3 groups), ax2 narrower (2 groups)
gs = GridSpec(2, 20, height_ratios=[1, 0.15], hspace=0.4, wspace=0.025)
ax1 = fig.add_subplot(gs[0, :12])  # ax1: wider (12/20)
ax2 = fig.add_subplot(gs[0, 12:18], sharey=ax1)  # ax2: narrower (6/20)
ax_legend = fig.add_subplot(gs[1, :])
ax_legend.axis('off')

# Plot Peptide Identification panel
plot_benchmark_panel(ax1, subset_pepid, experiments_ordered, methods_ordered_pepid, method_palette, method_markers, pthr)
ax1.set_title("Peptide Identification", fontsize=12, fontweight="bold")
ax1.set_ylabel("MCC Score", fontsize=10)
ax1.set_ylim(-0.1, 1.05)
plt.setp(ax2.get_yticklabels(), visible=False)

# Draw MCC interpretation thresholds
for i, (thresh, label) in enumerate(mcc_thresholds.items()):
    ax1.axhline(
        thresh, color=mcc_colors[label], alpha=1,
        linestyle="dotted", linewidth=1.5, 
        label=label, zorder=0
    )
    ax1.text(
        0.01,
        thresh,
        label,
        color=mcc_colors[label],
        ha="left",
        va="center",
        fontsize=8,
        fontweight="bold",
        transform=ax1.get_yaxis_transform(),
        bbox=dict(boxstyle='round,pad=0.3', facecolor='white', edgecolor=mcc_colors[label], alpha=0.8),
        zorder=0
    )

# Plot Peptide Grouping panel
plot_benchmark_panel(ax2, subset_pepgrp, experiments_ordered, methods_ordered_pepgrp, method_palette, method_markers, pthr)
ax2.set_title("Peptide Grouping", fontsize=12, fontweight="bold")

# Set ylims (-0.15 to 0.75)
ax1.set_ylim(-0.15, 0.75)
ax2.set_ylim(-0.15, 0.75)

# Add threshold lines (without text) to the second plot
for thresh, label in mcc_thresholds.items():
    ax2.axhline(thresh, color=mcc_colors[label], alpha=0.9, linestyle=":", linewidth=1.2, zorder=0)

# --- Unified Legend ---
legend_elements = [
    Line2D([0], [0], marker=method_markers[m], color='w', markerfacecolor=method_palette[m],
           markeredgecolor='black', markersize=9, label=m)
    for m in methods_ordered_pepid
]
ax_legend.legend(
    handles=legend_elements, loc='center', ncol=len(methods_ordered_pepid), 
    fontsize=10, title="Method", title_fontsize=11, frameon=False
)

# --- Final Touches ---
plt.tight_layout(rect=[0, 0.1, 1, 1])
plt.subplots_adjust(bottom=0.2)
plots.finalize_plot(
    fig, show=True, save=save_to_folder,
    filename='benchmark_compact_fig2ab',
    filepath=figure_path,
    formats=figure_formats, 
    transparent=transparent_bg,
    dpi=figure_dpi,
)

Key Observations:

  • All methods achieve Fair-to-Moderate MCC (~0.3–0.5) for identification tasks
  • Performance improves as more peptides are perturbed (1 → 2 → Random → 50%)
  • Peptide Grouping shows lower overall MCC, reflecting the increased difficulty of correctly clustering perturbed peptides
  • ProteoForge demonstrates competitive performance, particularly in the Random and 50% scenarios

2.2 Simulation 1: Impact of Data Imputation

Objective: Assess whether imputing missing values affects method performance compared to complete (fully quantified) datasets.

Design: Random perturbation scenario (2–50% peptides) with matched complete vs. imputed versions.

sim1_pepid = pd.read_feather('./data/Sim1/4_Sim1_PeptideIdentification_PerformanceData.feather')
sim1_pepgrp = pd.read_feather('./data/Sim1/4_Sim1_Grouping_PerformanceData.feather')
# Combine as sim1 with Benchmark column indicating pepID, or pepGrp
sim1_pepgrp['Benchmark'] = 'Peptide Grouping'
sim1_pepid['Benchmark'] = 'Peptide Identification'
sim1_combined = pd.concat([sim1_pepid, sim1_pepgrp], ignore_index=True)
sim1_combined['Experiment'].value_counts()
# Subset to show: '2>50% Peptides'
sim1_combined = sim1_combined[sim1_combined['Experiment'] == '2>50% Peptides']
# sim1_combined

Visualization Option 1: Horizontal Dumbbell Plot

Shows performance change between complete (●) and imputed (■) data with connecting lines indicating magnitude and direction of change.

fig, axs = plt.subplots(2, 1, figsize=(6, 4), sharex=True)
fig.suptitle('Imputation Impact on Performance', fontsize=12, fontweight='bold')

benchmarks = ['Peptide Identification', 'Peptide Grouping']
methods_order = {
    'Peptide Identification': ['COPF', 'PeCorA', 'ProteoForge'],
    'Peptide Grouping': ['COPF', 'ProteoForge']
}

for idx, benchmark in enumerate(benchmarks):
    ax = axs[idx]
    df_bench = sim1_combined[sim1_combined['Benchmark'] == benchmark]
    
    # Calculate mean MCC for complete and imputed data for each method
    means = df_bench.groupby(['Method', 'DataType'])['MCC'].mean().unstack()
    methods = methods_order[benchmark]
    means = means.reindex(methods)

    y_pos = np.arange(len(methods))
    
    # Draw compact dumbbell elements
    for i, method in enumerate(methods):
        mcc_complete = means.loc[method, 'complete']
        mcc_imputed = means.loc[method, 'imputed']
        color = method_palette.get(method, 'grey')
        
        # Thinner connecting line
        ax.plot([mcc_complete, mcc_imputed], [i, i], color=color, alpha=0.8, 
                linewidth=2, solid_capstyle='round', zorder=1)
        
        # Smaller markers
        ax.scatter(mcc_complete, i, color='black', marker='o', s=40, zorder=2, alpha=0.9)
        ax.scatter(mcc_imputed, i, color=color, marker='s', s=35, zorder=2)
        
        # Add difference annotation
        diff = mcc_imputed - mcc_complete
        mid_x = (mcc_complete + mcc_imputed) / 2
        ax.text(mid_x, i + 0.15, f'{diff:+.3f}', ha='center', va='bottom', 
                fontsize=8, color=color, fontweight='bold')

    # Compact aesthetics
    ax.set_title(benchmark, fontsize=10, fontweight='bold', pad=8)
    ax.grid(axis='x', linestyle=':', alpha=0.5)
    ax.set_yticks(y_pos)
    ax.set_yticklabels(methods, fontsize=9)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['left'].set_visible(False)
    ax.tick_params(axis='y', length=0)
    ax.tick_params(axis='x', labelsize=8)

axs[-1].set_xlabel('MCC Score', fontsize=9)

# Compact legend
legend_elements = [
    Line2D([0], [0], color='gray', marker='o', linestyle='None', markersize=6, 
           markerfacecolor='black', label='Complete'),
    Line2D([0], [0], color='gray', marker='s', linestyle='None', markersize=6, 
           label='Imputed'),
]

fig.legend(handles=legend_elements, loc='lower center', ncol=2, 
           bbox_to_anchor=(0.5, -0.02), fontsize=8, frameon=False)

plt.tight_layout(rect=[0, 0.05, 1, 0.95])
plt.show()

Assessment: Clear visualization of the magnitude of change; visually distinct, but uses significant whitespace.

Visualization Option 2: Slope Graph

Connects performance from "Complete" to "Imputed" conditions—effective for showing direction when changes are large.

fig, axs = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
fig.suptitle('Performance: Complete vs Imputed Data', fontsize=12, fontweight='bold')

benchmarks = ['Peptide Identification', 'Peptide Grouping']
methods_order = {'Peptide Identification': ['COPF', 'PeCorA', 'ProteoForge'],
                 'Peptide Grouping': ['COPF', 'ProteoForge']}

for idx, benchmark in enumerate(benchmarks):
    ax = axs[idx]
    df_bench = sim1_combined[sim1_combined['Benchmark'] == benchmark]
    
    means = df_bench.groupby(['Method', 'DataType'])['MCC'].mean().unstack()
    methods = methods_order[benchmark]
    means = means.reindex(methods)
    
    # Create slope lines
    x_positions = [0, 1]  # Complete at 0, Imputed at 1
    
    for i, method in enumerate(methods):
        mcc_complete = means.loc[method, 'complete']
        mcc_imputed = means.loc[method, 'imputed']
        color = method_palette.get(method, 'grey')
        
        # Draw slope line
        ax.plot(x_positions, [mcc_complete, mcc_imputed], 
                color=color, linewidth=2, alpha=0.8, marker='o', markersize=6,
                label=method if idx == 0 else "")
        
        # Method labels at the midpoint
        mid_y = (mcc_complete + mcc_imputed) / 2
        ax.text(0.5, mid_y, method, ha='center', va='center', 
                fontsize=8, fontweight='bold', color=color,
                bbox=dict(boxstyle='round,pad=0.2', facecolor='white', alpha=0.8))

    ax.set_title(benchmark, fontsize=10, fontweight='bold')
    ax.set_xticks(x_positions)
    ax.set_xticklabels(['Complete', 'Imputed'], fontsize=9)
    ax.set_ylabel('MCC Score' if idx == 0 else '', fontsize=9)
    ax.grid(axis='y', linestyle=':', alpha=0.5)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.tick_params(labelsize=8)
    ax.set_ylim(0, 0.6)  # Set consistent y-axis range

plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()

Assessment: Small performance differences result in nearly horizontal lines—less effective for this dataset.

Visualization Option 3: Grouped Bar Chart

Traditional side-by-side comparison with hierarchical x-axis grouping methods by benchmark.

fig, ax = plt.subplots(figsize=(10, 4))
fig.suptitle('Impact of Data Imputation', fontsize=14, fontweight='bold')

# Prepare data for plotting
plot_data = []
benchmarks = ['Peptide Identification', 'Peptide Grouping']

for benchmark in benchmarks:
    df_bench = sim1_combined[sim1_combined['Benchmark'] == benchmark]
    means = df_bench.groupby(['Method', 'DataType'])['MCC'].mean().unstack()
    
    for method in means.index:
        complete_val = means.loc[method, 'complete']
        imputed_val = means.loc[method, 'imputed']
        diff = imputed_val - complete_val
        
        plot_data.append({
            'Benchmark': benchmark,
            'Method': method,
            'Complete': complete_val,
            'Imputed': imputed_val,
            'Difference': diff
        })

plot_df = pd.DataFrame(plot_data)

# Create hierarchical x positions
n_benchmarks = len(benchmarks)
n_methods_per_bench = {'Peptide Identification': 3, 'Peptide Grouping': 2}  # COPF, PeCorA, ProteoForge vs COPF, ProteoForge
method_order = {'Peptide Identification': ['COPF', 'PeCorA', 'ProteoForge'], 
                'Peptide Grouping': ['COPF', 'ProteoForge']}

x_positions = []
labels = []
benchmark_centers = []

x = 0
for i, benchmark in enumerate(benchmarks):
    methods = method_order[benchmark]
    start_x = x
    
    for j, method in enumerate(methods):
        method_data = plot_df[(plot_df['Benchmark'] == benchmark) & (plot_df['Method'] == method)]
        if not method_data.empty:
            row = method_data.iloc[0]
            
            # Plot bars
            width = 0.35
            bar1 = ax.bar(x - width/2, row['Complete'], width, color='lightgray', 
                         edgecolor='black', alpha=0.8, label='Complete' if i == 0 and j == 0 else "")
            bar2 = ax.bar(x + width/2, row['Imputed'], width,
                         color=method_palette.get(method, 'gray'), edgecolor='black', alpha=0.9,
                         label='Imputed' if i == 0 and j == 0 else "")
            
            # Add change indicators
            diff = row['Difference']
            arrow_color = 'green' if diff > 0 else 'red'
            arrow_symbol = '↑' if diff > 0 else '↓'
            
            ax.text(x, max(row['Complete'], row['Imputed']) + 0.05, 
                   f'{arrow_symbol} {abs(diff):.3f}', 
                   ha='center', va='bottom', fontsize=9, 
                   color=arrow_color, fontweight='bold')
            
            x_positions.append(x)
            labels.append(method)
        
        x += 1
    
    # Calculate benchmark center for labeling
    end_x = x - 1
    benchmark_centers.append((start_x + end_x) / 2)
    x += 0.5  # Space between benchmarks

# Set labels and ticks
ax.set_xticks(x_positions)
ax.set_xticklabels(labels, fontsize=10, rotation=45, ha='right')

# Add benchmark labels as secondary x-axis
ax2 = ax.twiny()
ax2.set_xlim(ax.get_xlim())
ax2.set_xticks(benchmark_centers)
ax2.set_xticklabels(benchmarks, fontsize=11, fontweight='bold')
ax2.tick_params(axis='x', length=0)

# Styling
ax.set_ylabel('MCC Score', fontsize=11)
ax.grid(axis='y', linestyle=':', alpha=0.5)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax2.spines['top'].set_visible(False)
ax2.spines['right'].set_visible(False)
ax2.spines['bottom'].set_visible(False)
ax2.spines['left'].set_visible(False)

# Add vertical separator between benchmarks
separator_x = x_positions[2] + 0.75  # Between ProteoForge (PepID) and COPF (PepGrp)
ax.axvline(separator_x, color='gray', linestyle='--', alpha=0.5)

# Legend
handles, labels_leg = ax.get_legend_handles_labels()
ax.legend(handles, labels_leg, loc='upper left', fontsize=10, frameon=False)

plt.tight_layout()
plt.show()

Assessment: Intuitive layout but horizontally expansive—better suited for supplementary material.

Visualization Option 4: Vertical Overlapping Bars (Selected)

Compact vertical layout overlaying imputed (colored) on complete (gray) bars with annotated differences.

fig, axs = plt.subplots(2, 1, figsize=(4, 6), sharex=True)
fig.suptitle('MCC Performance: Complete vs Imputed Data (Random Perturbation (Sim1))', fontsize=12, fontweight='bold')

benchmarks = ['Peptide Identification', 'Peptide Grouping']
methods_order = {'Peptide Identification': ['COPF', 'PeCorA', 'ProteoForge'],
                 'Peptide Grouping': ['COPF', 'ProteoForge']}

for idx, benchmark in enumerate(benchmarks):
    ax = axs[idx]
    df_bench = sim1_combined[sim1_combined['Benchmark'] == benchmark]
    
    means = df_bench.groupby(['Method', 'DataType'])['MCC'].mean().unstack()
    methods = methods_order[benchmark]
    means = means.reindex(methods)
    
    y_pos = np.arange(len(methods))
    colors = [method_palette.get(method, 'gray') for method in methods]
    
    # Plot complete data (baseline) as light bars
    complete_bars = ax.barh(y_pos, means['complete'], height=0.6, 
                           color='lightgray', alpha=0.6, edgecolor='black',
                           label='Complete Data')
    
    # Plot imputed data as colored bars
    imputed_bars = ax.barh(y_pos, means['imputed'], height=0.4, 
                          color=colors, alpha=0.9, edgecolor='black',
                          label='Imputed Data')
    
    # Add MCC values and differences as text
    for i, method in enumerate(methods):
        complete_val = means.loc[method, 'complete']
        imputed_val = means.loc[method, 'imputed']
        diff = imputed_val - complete_val
        
        # Show complete MCC value
        ax.text(complete_val + 0.01, i + 0.15, f'{complete_val:.3f}', 
               va='center', ha='left', fontsize=9, color='black', fontweight='bold')
        
        # Show imputed MCC value  
        ax.text(imputed_val + 0.01, i - 0.15, f'{imputed_val:.3f}', 
               va='center', ha='left', fontsize=9, color=colors[i], fontweight='bold')
        
        # Show difference with arrow
        diff_color = 'green' if diff > 0 else 'red'
        arrow_symbol = '↑' if diff > 0 else '↓'
        
        # Position difference text at the right edge
        max_val = max(complete_val, imputed_val)
        ax.text(max_val + 0.08, i, f'{arrow_symbol}{abs(diff):.3f}', 
               va='center', ha='left', fontsize=8, 
               color=diff_color, fontweight='bold',
               bbox=dict(boxstyle='round,pad=0.2', facecolor='white', 
                        edgecolor=diff_color, alpha=0.8))
    
    # Styling
    # ax.set_title(benchmark, fontsize=11, fontweight='bold', pad=5)
    ax.set_yticks(y_pos)
    ax.set_yticklabels(methods, fontsize=10, fontweight='bold')
    if idx == 1:  # Only show x-axis label on bottom plot
        ax.set_xlabel('MCC Score', fontsize=10)
    ax.grid(axis='x', linestyle=':', alpha=0.4)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.tick_params(labelsize=12)
    
    # Set consistent x-axis limits for all subplots
    ax.set_xlim(0, 0.7)  # Fixed range to accommodate all data

# Unified legend at the bottom
handles, labels_leg = axs[0].get_legend_handles_labels()
fig.legend(handles, labels_leg, loc='lower center', ncol=2, 
           bbox_to_anchor=(0.5, -0.05), fontsize=10, frameon=False)

# Add explanation
fig.text(0.5, -0.12, 'Green ↑: Improvement with imputation | Red ↓: Decline with imputation', 
         ha='center', fontsize=9, style='italic')

plt.tight_layout(rect=[0, 0.12, 1, 0.95])
plt.show()
plots.finalize_plot(
    fig, show=True, save=save_to_folder,
    filename='benchmark_sim1_imputation_effect',
    filepath=figure_path,
    formats=figure_formats, 
    transparent=transparent_bg,
    dpi=figure_dpi,
)

Key Findings — Simulation 1:

  • In peptide identification, imputation has little impact on ProteoForge and COPF, but PeCorA declines sharply
  • In peptide grouping, ProteoForge actually improves with imputation, while COPF declines slightly.
    • This is because imputation, especially of peptides with complete missingness, can introduce potential new peptide groups; ProteoForge groups these correctly, whereas COPF fails to do so.
  • The selected visualization (Option 4) provides the best information density for the main figure

2.3 Simulation 2: Effect of Missingness Level

Objective: Quantify how increasing proportions of missing data affect detection performance.

Design: Factorial combinations of protein-level (0–80%) × peptide-level (0–80%) missingness.

Key Questions:

  • At what missingness threshold does performance critically degrade?
  • Do methods differ in their tolerance to incomplete data?
# Load and explore Sim2 data (Missingness analysis)
sim2_pepid = pd.read_feather('./data/Sim2/4_Sim2_PeptideIdentification_PerformanceData.feather')
sim2_pepgrp = pd.read_feather('./data/Sim2/4_Sim2_Grouping_PerformanceData.feather')

# Combine data
sim2_pepgrp['Benchmark'] = 'Peptide Grouping'
sim2_pepid['Benchmark'] = 'Peptide Identification'
sim2_combined = pd.concat([sim2_pepid, sim2_pepgrp], ignore_index=True)

# Clean up the missingness data
def clean_missingness_values(x):
    if isinstance(x, str):
        return float(x.rstrip('%')) / 100.0
    return float(x)

sim2_combined['ProteinMissingness'] = sim2_combined['ProteinMissingness'].apply(clean_missingness_values)
sim2_combined['PeptideMissingness'] = sim2_combined['PeptideMissingness'].apply(clean_missingness_values)

sim2_summary = sim2_combined.groupby(['Method', 'ProteinMissingness', 'PeptideMissingness', 'Benchmark'])['MCC'].mean().reset_index()
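For illustration, the cleaning helper (repeated here so the snippet runs standalone) accepts both percent strings and numeric values:

```python
# Repeated from the cell above so this snippet is self-contained.
def clean_missingness_values(x):
    # Percent strings like '40%' become fractions; numbers pass through.
    if isinstance(x, str):
        return float(x.rstrip('%')) / 100.0
    return float(x)

print(clean_missingness_values('40%'))  # 0.4
print(clean_missingness_values(0.8))    # 0.8
```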

Visualization Option 1: Degradation Line Plots

MCC trajectories as peptide missingness increases (x-axis), stratified by protein missingness levels (line styles: solid=0%, dashed=40%, dotted=80%).

def create_degradation_lines(figsize=(6, 5)):
    """Show how performance degrades with increasing missingness - Vertical Layout"""
    
    fig, axs = plt.subplots(2, 1, figsize=figsize, sharex=True)
    fig.suptitle('Performance Degradation with Missingness', fontsize=11, fontweight='bold')
    
    benchmarks = ['Peptide Identification', 'Peptide Grouping']
    method_sets = {
        'Peptide Identification': ['COPF', 'PeCorA', 'ProteoForge'],
        'Peptide Grouping': ['COPF', 'ProteoForge']  # No PeCorA for grouping
    }
    
    for idx, benchmark in enumerate(benchmarks):
        ax = axs[idx]
        methods = method_sets[benchmark]
        
        # Plot lines for each method at different protein missingness levels
        protein_levels = [0.0, 0.4, 0.8]  # Show key levels only
        
        for method in methods:
            color = method_palette.get(method, 'gray')
            
            for i, prot_miss in enumerate(protein_levels):
                method_data = sim2_summary[(sim2_summary['Method'] == method) & 
                                          (sim2_summary['Benchmark'] == benchmark) &
                                          (sim2_summary['ProteinMissingness'] == prot_miss)]
                
                # Sort by peptide missingness
                method_data = method_data.sort_values('PeptideMissingness')
                
                linestyle = ['-', '--', ':'][i]  # Different styles for different protein missingness
                alpha = 1.0 if i == 0 else 0.7
                linewidth = 2.5 if method == 'ProteoForge' else 2
                
                ax.plot(method_data['PeptideMissingness'] * 100, method_data['MCC'], 
                       color=color, linestyle=linestyle, marker='o', markersize=3,
                       alpha=alpha, linewidth=linewidth)
        
        ax.set_title(benchmark, fontsize=10, fontweight='bold', pad=5)
        if idx == 1:  # Only bottom plot gets x-axis label
            ax.set_xlabel('Peptide Missingness (%)', fontsize=9)
        ax.set_ylabel('MCC Score', fontsize=9)
        ax.grid(True, linestyle=':', alpha=0.4)
        ax.spines['top'].set_visible(False)
        ax.spines['right'].set_visible(False)
        ax.tick_params(labelsize=8)
        ax.set_ylim(0, 0.85)
    
    # Create custom legend
    legend_elements = []
    for method in ['COPF', 'PeCorA', 'ProteoForge']:
        color = method_palette.get(method, 'gray')
        legend_elements.append(Line2D([0], [0], color=color, lw=2, label=method))
    
    # Add line style legend
    legend_elements.extend([
        Line2D([0], [0], color='gray', linestyle='-', lw=1, label='Protein 0%'),
        Line2D([0], [0], color='gray', linestyle='--', lw=1, label='Protein 40%'),
        Line2D([0], [0], color='gray', linestyle=':', lw=1, label='Protein 80%')
    ])
    
    fig.legend(handles=legend_elements, loc='lower center', ncol=3, 
              bbox_to_anchor=(0.5, -0.05), fontsize=8, frameon=False)
    
    plt.tight_layout(rect=[0, 0.08, 1, 0.95])
    return fig

create_degradation_lines()
plt.show()

Assessment: Shows the interaction between protein and peptide missingness and makes the most obvious point easy to spot; however, the plot is cluttered, and details such as the protein-level missingness (encoded by line style) are hard to read.

Visualization Option 2: Compact Line Plot with Change Annotations (Selected)

Shows MCC at diagonal missingness levels (Complete → Low → Moderate → High → Extreme) with Δ annotations giving the change of each level relative to the Complete baseline.

fig, axs = plt.subplots(2, 1, figsize=(5, 5), sharex=True)
fig.suptitle('Performance Change Across Missingness Levels', fontsize=12, fontweight='bold')

benchmarks = ['Peptide Identification', 'Peptide Grouping']
method_sets = {
    'Peptide Identification': ['COPF', 'PeCorA', 'ProteoForge'],
    'Peptide Grouping': ['COPF', 'ProteoForge']
}

change_levels = [
    ('Complete\n(0%,0%)', 0.0, 0.0),
    ('Low\n(20%,20%)', 0.2, 0.2),
    ('Moderate\n(40%,40%)', 0.4, 0.4),
    ('High\n(60%,60%)', 0.6, 0.6),
    ('Extreme\n(80%,80%)', 0.8, 0.8)
]
change_labels = [x[0] for x in change_levels]

for idx, benchmark in enumerate(benchmarks):
    ax = axs[idx]
    methods = method_sets[benchmark]
    for method in methods:
        color = method_palette.get(method, 'gray')
        marker = method_markers.get(method, 'o')
        mcc_vals = []
        for _, prot_miss, pep_miss in change_levels:
            mcc_val = sim2_summary[(sim2_summary['Method'] == method) &
                                   (sim2_summary['Benchmark'] == benchmark) &
                                   (sim2_summary['ProteinMissingness'] == prot_miss) &
                                   (sim2_summary['PeptideMissingness'] == pep_miss)]['MCC'].values
            mcc_vals.append(mcc_val[0] if len(mcc_val) > 0 else np.nan)
        ax.plot(change_labels, mcc_vals, color=color, marker=marker, markersize=8, linewidth=2.5,
                label=method, alpha=0.9, markeredgecolor='black', markeredgewidth=1.5)
        # Annotate difference from Complete
        for i in range(1, len(mcc_vals)):
            if not np.isnan(mcc_vals[0]) and not np.isnan(mcc_vals[i]):
                diff = mcc_vals[i] - mcc_vals[0]
                arrow = '↑' if diff > 0 else '↓'
                color_diff = 'green' if diff > 0 else 'red'
                ax.text(change_labels[i], mcc_vals[i] + 0.03, f'{arrow}{abs(diff):.2f}',
                        ha='center', va='bottom', fontsize=8, color=color_diff, fontweight='bold')
        # Annotate the complete MCC value
        if not np.isnan(mcc_vals[0]):
            ax.text(change_labels[0], mcc_vals[0] - 0.04, f'{mcc_vals[0]:.2f}',
                    ha='center', va='top', fontsize=8, color=color, fontweight='bold',
                    bbox=dict(boxstyle='round,pad=0.2', facecolor='white', edgecolor=color, alpha=0.7))
    ax.set_title(benchmark, fontsize=10, fontweight='bold', pad=5)
    ax.set_ylabel('MCC Score', fontsize=9)
    ax.grid(axis='y', linestyle=':', alpha=0.4)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.tick_params(labelsize=8)
    ax.set_ylim(-.15, 0.75)

axs[-1].set_xlabel('Missingness Level (Protein%, Peptide%)', fontsize=9)
axs[-1].set_xticks(change_labels)
axs[-1].set_xticklabels(change_labels, fontsize=9)

# Unified legend
handles, labels = axs[0].get_legend_handles_labels()
fig.legend(handles, labels, loc='lower center', ncol=3, bbox_to_anchor=(0.5, -0.08), fontsize=9, frameon=False)

plt.tight_layout(rect=[0, 0.08, 1, 0.95])
plt.show()
plots.finalize_plot(
    fig, show=True, save=save_to_folder,
    filename='benchmark_sim2_missingness_effect',
    filepath=figure_path,
    formats=figure_formats, 
    transparent=transparent_bg,
    dpi=figure_dpi,
)

Assessment: Clearly shows degradation trend; annotations highlight performance loss magnitude.

Visualization Option 3: Performance Ratio Bars

Shows MCC retention relative to complete data baseline (ratio = 1.0 means no performance loss).

def create_performance_ratios(figsize=(5, 5)):
    """Show performance relative to complete data (0%,0%) as compact bars - Vertical Layout"""
    
    fig, axs = plt.subplots(2, 1, figsize=figsize, sharex=True)
    fig.suptitle('Performance Relative to Complete Data', fontsize=11, fontweight='bold')
    
    benchmarks = ['Peptide Identification', 'Peptide Grouping']
    method_sets = {
        'Peptide Identification': ['COPF', 'PeCorA', 'ProteoForge'],
        'Peptide Grouping': ['COPF', 'ProteoForge']  # No PeCorA for grouping
    }
    
    # Key impaired scenarios relative to complete
    scenarios = [
        (0.2, 0.2, 'Low\n(20%,20%)'),
        (0.4, 0.4, 'Moderate\n(40%,40%)'),
        (0.6, 0.6, 'High\n(60%,60%)'),
        (0.8, 0.8, 'Severe\n(80%,80%)')
    ]
    
    for idx, benchmark in enumerate(benchmarks):
        ax = axs[idx]
        methods = method_sets[benchmark]
        
        # Get baseline performance (complete data)
        baseline_mcc = {}
        for method in methods:
            baseline_data = sim2_summary[(sim2_summary['Method'] == method) & 
                                        (sim2_summary['Benchmark'] == benchmark) &
                                        (sim2_summary['ProteinMissingness'] == 0.0) &
                                        (sim2_summary['PeptideMissingness'] == 0.0)]
            if len(baseline_data) > 0:
                baseline_mcc[method] = baseline_data['MCC'].values[0]
        
        x_pos = np.arange(len(scenarios))
        width = 0.8 / len(methods)
        
        for i, method in enumerate(methods):
            if method not in baseline_mcc:
                continue
                
            color = method_palette.get(method, 'gray')
            ratios = []
            
            for prot_miss, pep_miss, label in scenarios:
                impaired_data = sim2_summary[(sim2_summary['Method'] == method) & 
                                           (sim2_summary['Benchmark'] == benchmark) &
                                           (sim2_summary['ProteinMissingness'] == prot_miss) &
                                           (sim2_summary['PeptideMissingness'] == pep_miss)]
                
                if len(impaired_data) > 0:
                    impaired_val = impaired_data['MCC'].values[0]
                    # Calculate ratio (performance retention)
                    ratio = impaired_val / baseline_mcc[method] if baseline_mcc[method] > 0 else 0
                    ratios.append(ratio)
                else:
                    ratios.append(0)
            
            # Plot bars
            x_offset = x_pos + (i - len(methods)/2 + 0.5) * width
            bars = ax.bar(x_offset, ratios, width, color=color, alpha=0.8, 
                         edgecolor='black', linewidth=0.5,
                         label=method if idx == 0 else "")
            
            # Add ratio text on bars
            for bar, ratio in zip(bars, ratios):
                if ratio > 0:
                    height = bar.get_height()
                    ax.text(bar.get_x() + bar.get_width()/2., height + 0.02,
                           f'{ratio:.2f}', ha='center', va='bottom', fontsize=7, fontweight='bold')
        
        # Add reference line at 1.0 (no performance loss)
        ax.axhline(y=1.0, color='red', linestyle='--', alpha=0.7, linewidth=1)
        
        ax.set_title(benchmark, fontsize=10, fontweight='bold', pad=5)
        if idx == 1:  # Only bottom plot gets x-axis labels
            ax.set_xticks(x_pos)
            ax.set_xticklabels([s[2] for s in scenarios], fontsize=8)
        else:
            ax.set_xticks(x_pos)
            ax.set_xticklabels([])
        ax.set_ylabel('Performance Ratio', fontsize=9)
        ax.grid(axis='y', linestyle=':', alpha=0.4)
        ax.spines['top'].set_visible(False)
        ax.spines['right'].set_visible(False)
        ax.tick_params(labelsize=8)
        ax.set_ylim(0, 1.3)
    
    # Legend and annotations
    handles, labels = axs[0].get_legend_handles_labels()
    fig.legend(handles, labels, loc='lower center', ncol=3, 
              bbox_to_anchor=(0.5, -0.05), fontsize=9, frameon=False)
    
    fig.text(0.02, 0.95, '1.0 = No performance loss', fontsize=7, 
             color='red', style='italic', transform=fig.transFigure)
    
    plt.tight_layout(rect=[0, 0.08, 1, 0.92])
    return fig

create_performance_ratios()
plt.show()

Assessment: Clearly shows the degradation trend; however, the ratio flatters COPF, which appears unaffected by missingness even at extreme levels. While true at face value, this is misleading: COPF's absolute performance is already low, so the ratio exaggerates its stability.
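
The caveat about retention ratios can be made concrete with a toy calculation (illustrative numbers only, not taken from the benchmark results):

```python
# Illustrative only: why retention ratios can flatter a low-baseline method.
# A method with a low baseline MCC can lose almost nothing in absolute terms
# yet look "more stable" than a stronger method that loses more absolute MCC.
def retention(baseline_mcc, impaired_mcc):
    """Performance retention ratio relative to the complete-data baseline."""
    return impaired_mcc / baseline_mcc if baseline_mcc > 0 else 0.0

weak_ratio = retention(0.10, 0.09)    # weak method: loses only 0.01 absolute MCC
strong_ratio = retention(0.70, 0.50)  # strong method: loses 0.20 absolute MCC

# The weak method shows the higher retention (0.90 vs ~0.71) despite being
# far less useful in absolute terms at every missingness level.
print(f"weak: {weak_ratio:.2f}, strong: {strong_ratio:.2f}")
```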

Key Findings — Simulation 2:

  • Peptide Identification:
    • Performance degrades progressively with increasing missingness
    • Methods maintain reasonable accuracy (~70% retention) up to 40% missingness
    • ProteoForge shows minimal effect until the severe (80%) missingness level, and even there retains roughly 0.80 of its baseline performance
    • Beyond 60% missingness, detection reliability decreases substantially
  • Peptide Grouping:
    • Peptide grouping improves slightly with low to moderate missingness for ProteoForge
    • COPF sits at roughly 0.75–0.85 retention at all levels, meaning the amount of missingness matters little for it

2.4 Simulation 3: Detection Sensitivity vs. Perturbation Magnitude

Objective: Determine the minimum perturbation magnitude required for reliable detection.

Design: Perturbation magnitudes ranging from subtle (0.10–0.25 log2) to strong (1.75–2.00 log2) with complete data (no missing values).
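
For intuition, the log2 magnitudes above translate into modest linear fold changes; a standalone sketch (range endpoints taken from the design description):

```python
# Convert the simulated log2 perturbation magnitudes to linear fold changes.
# The subtlest range (0.10-0.25 log2) is only a ~7-19% abundance change,
# while the strongest range (1.75-2.00 log2) corresponds to ~3.4-4.0x.
ranges_log2 = [(0.10, 0.25), (0.50, 0.75), (1.75, 2.00)]
for lo, hi in ranges_log2:
    print(f"{lo:.2f}-{hi:.2f} log2  ->  {2**lo:.2f}x-{2**hi:.2f}x linear")
```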

Key Questions:

  • What is the practical detection threshold for each method?
  • How does sensitivity scale with perturbation size?
# Load and explore Sim3 data (Magnitude of perturbation levels)
sim3_pepid = pd.read_feather('./data/Sim3/4_Sim3_PeptideIdentification_PerformanceData.feather')
sim3_pepid['Benchmark'] = 'Peptide Identification'
sim3_pepgrp = pd.read_feather('./data/Sim3/4_Sim3_Grouping_PerformanceData.feather')
sim3_pepgrp['Benchmark'] = 'Peptide Grouping'
sim3_combined = pd.concat([sim3_pepid, sim3_pepgrp], ignore_index=True)
# sim3_combined

pert_ranges = sorted(sim3_combined['PerturbationRange'].unique())

# Process sim3 data for visualization
def process_sim3_data():
    """Process sim3 data and normalize perturbation ranges"""
    
    # Create a copy to work with
    sim3_processed = sim3_combined.copy()
    
    # Normalize perturbation range formatting and create magnitude column
    def extract_magnitude(pert_range):
        """Extract average magnitude from perturbation range string"""
        if isinstance(pert_range, str) and '-' in pert_range:
            try:
                start, end = pert_range.split('-')
                return (float(start) + float(end)) / 2
            except ValueError:
                return None
        return None
    
    sim3_processed['Magnitude'] = sim3_processed['PerturbationRange'].apply(extract_magnitude)
    
    # Normalize perturbation range names (handle 0.1 vs 0.10 inconsistency)
    def normalize_range(pert_range):
        if isinstance(pert_range, str) and '-' in pert_range:
            try:
                start, end = pert_range.split('-')
                start_f = float(start)
                end_f = float(end)
                return f"{start_f:.2f}-{end_f:.2f}"
            except ValueError:
                return pert_range
        return pert_range
    
    sim3_processed['PerturbationRange_norm'] = sim3_processed['PerturbationRange'].apply(normalize_range)
    
    # Average MCC across thresholds for each combination
    sim3_summary = sim3_processed.groupby(['Method', 'Benchmark', 'PerturbationRange_norm', 'Magnitude'])['MCC'].mean().reset_index()
    
    # Sort by magnitude for proper ordering
    sim3_summary = sim3_summary.sort_values('Magnitude')
    
    print("Processed sim3 summary:")
    print(f"Shape: {sim3_summary.shape}")
    print(f"Magnitude levels: {sorted(sim3_summary['Magnitude'].unique())}")
    print(f"Perturbation ranges: {sorted(sim3_summary['PerturbationRange_norm'].unique())}")
    
    return sim3_summary

sim3_summary = process_sim3_data()
sim3_summary.head()
Processed sim3 summary:
Shape: (40, 5)
Magnitude levels: [np.float64(0.175), np.float64(0.375), np.float64(0.625), np.float64(0.875), np.float64(1.125), np.float64(1.375), np.float64(1.625), np.float64(1.875)]
Perturbation ranges: ['0.10-0.25', '0.25-0.50', '0.50-0.75', '0.75-1.00', '1.00-1.25', '1.25-1.50', '1.50-1.75', '1.75-2.00']
Method Benchmark PerturbationRange_norm Magnitude MCC
0 COPF Peptide Grouping 0.10-0.25 0.1750 0.0014
8 COPF Peptide Identification 0.10-0.25 0.1750 -0.0050
24 ProteoForge Peptide Grouping 0.10-0.25 0.1750 0.4274
16 PeCorA Peptide Identification 0.10-0.25 0.1750 0.0296
32 ProteoForge Peptide Identification 0.10-0.25 0.1750 0.1925

Visualization Option 1: Detection Curves with Endpoint Annotations (Selected)

MCC vs. perturbation magnitude with method-specific trajectories and Δ(MCC) summaries.

def create_detection_curves_combined(figsize=(3.2, 6)):
    """Stylized detection curves with distribution-of-change summaries to the right."""
    fig, ax = plt.subplots(figsize=figsize)
    fig.suptitle('Perturbation Detection Sensitivity', fontsize=11, fontweight='bold', y=0.98)

    # Ensure sim3_combined has a Magnitude column (average of perturbation range endpoints)
    sim3_full = sim3_combined.copy()
    def _extract_mag(pert):
        if isinstance(pert, str) and ('-' in pert):
            try:
                a, b = pert.split('-')
                return (float(a) + float(b)) / 2.0
            except ValueError:
                return np.nan
        return np.nan
    if 'Magnitude' not in sim3_full.columns:
        sim3_full['Magnitude'] = sim3_full['PerturbationRange'].apply(_extract_mag)

    benchmarks = ['Peptide Identification', 'Peptide Grouping']
    method_sets = {
        'Peptide Identification': ['COPF', 'PeCorA', 'ProteoForge'],
        'Peptide Grouping': ['COPF', 'ProteoForge']
    }
    linestyle_map = {
        'Peptide Identification': '-',
        'Peptide Grouping': '--'
    }

    # Plot curves
    plotted_info = []  # collect info for summary plotting (method, benchmark, first_mag, last_mag)
    for benchmark in benchmarks:
        methods = method_sets[benchmark]
        for method in methods:
            method_data = sim3_summary[
                (sim3_summary['Method'] == method) & 
                (sim3_summary['Benchmark'] == benchmark)
            ].sort_values('Magnitude')
            if len(method_data) > 0:
                color = method_palette.get(method, 'black')
                marker = method_markers.get(method, 'o')
                ls = linestyle_map[benchmark]
                ax.plot(
                    method_data['Magnitude'], method_data['MCC'],
                    linestyle=ls, color=color, linewidth=2.5,
                    marker=marker, markersize=7, alpha=0.9,
                    markerfacecolor=color, markeredgecolor='black'
                )

                mag_vals = method_data['Magnitude'].values
                mcc_vals = method_data['MCC'].values

                # Directly label at the end of the curve
                label_text = "ID" if benchmark == "Peptide Identification" else "Group"
                ax.text(
                    mag_vals[-1] + 0.03, mcc_vals[-1],
                    f"{label_text}",
                    ha='left', va='center', fontsize=8, fontweight='bold',
                    color=color, bbox=dict(boxstyle='round,pad=0.2', facecolor='white', alpha=0.8)
                )

                plotted_info.append({
                    'method': method,
                    'benchmark': benchmark,
                    'first_mag': mag_vals[0],
                    'last_mag': mag_vals[-1],
                    'color': color
                })

    ax.set_xlabel('Mean Perturbation Magnitude (log2)', fontsize=9)
    ax.set_ylabel('MCC Score', fontsize=10)
    ax.set_xlim(0.1, 2.0)
    ax.set_ylim(-0.05, 0.85)
    ax.grid(True, alpha=0.3, linestyle='--')
    mag_ticks = [0.2, 0.6, 1.0, 1.4, 1.8]
    ax.set_xticks(mag_ticks)
    ax.set_xticklabels([str(x) for x in mag_ticks], fontsize=8)
    ax.tick_params(axis='y', labelsize=8)

    # --- Summary to the right: vertical lines with annotation spanning MCC change ---
    if len(plotted_info) > 0:
        max_mag = sim3_summary['Magnitude'].max()
        x_base = max_mag + 0.08
        # create small offsets so multiple methods don't overlap
        max_methods = max(len(method_sets[b]) for b in benchmarks)
        offsets = np.linspace(-0.06, 0.06, max_methods)

        for bench_idx, benchmark in enumerate(benchmarks):
            methods = method_sets[benchmark]
            for m_idx, method in enumerate(methods):
                info = next((it for it in plotted_info if it['method'] == method and it['benchmark'] == benchmark), None)
                if info is None:
                    continue

                first_mag = info['first_mag']
                last_mag = info['last_mag']
                color = info['color']

                # Get MCC at first and last magnitude
                mrow_first = sim3_summary[
                    (sim3_summary['Method'] == method) &
                    (sim3_summary['Benchmark'] == benchmark) &
                    (sim3_summary['Magnitude'] == first_mag)
                ]
                mrow_last = sim3_summary[
                    (sim3_summary['Method'] == method) &
                    (sim3_summary['Benchmark'] == benchmark) &
                    (sim3_summary['Magnitude'] == last_mag)
                ]
                if len(mrow_first) > 0 and len(mrow_last) > 0:
                    mcc_first = mrow_first['MCC'].median()
                    mcc_last = mrow_last['MCC'].median()
                    median_diff = mcc_last - mcc_first

                    x_pos = x_base + offsets[m_idx]
                    # Draw vertical line spanning MCC range
                    ax.plot([x_pos, x_pos], [mcc_first, mcc_last], color=color, linewidth=2.5, alpha=0.8, linestyle=linestyle_map[benchmark])
                    # Annotate at midpoint
                    mid_y = (mcc_first + mcc_last) / 2
                    txt = (f"Δ={median_diff:.3f}")
                    ax.text(x_pos + 0.03, mid_y, txt, fontsize=7, va='center', ha='left',
                            color=color, bbox=dict(boxstyle='round,pad=0.2', facecolor='white', alpha=0.8))

        # expand x-limits to accommodate the summaries
        ax.set_xlim(ax.get_xlim()[0], x_base + 0.18)


    plt.tight_layout()
    return fig

fig = create_detection_curves_combined(figsize=(5, 6))
plt.show()
plots.finalize_plot(
    fig, show=True, save=save_to_folder,
    filename='benchmark_sim3_perturbation_detection',
    filepath=figure_path,
    formats=figure_formats, 
    transparent=transparent_bg,
    dpi=figure_dpi,
)

Assessment: Comprehensive view of sensitivity curves; endpoint Δ annotations aid interpretation.

Visualization Option 2: Detection Efficiency Bars

Compact comparison at categorical magnitude levels (Low → Mid-Low → Mid-High → High).

# SIM3 Compact Visualization Option 2: Detection Efficiency Bar Chart (vertical)
def create_detection_efficiency_bars(figsize=(5, 4)):
    """Show detection efficiency at key perturbation levels"""
    
    fig, axs = plt.subplots(2, 1, figsize=figsize, sharex=True)
    fig.suptitle('Detection Efficiency at Key Perturbation Levels', fontsize=11, fontweight='bold')
    
    # Select key perturbation levels for comparison
    key_magnitudes = [0.175, 0.625, 1.125, 1.875]  # Low, Mid-low, Mid-high, High
    mag_labels = ['Low\n(~0.2)', 'Mid-Low\n(~0.6)', 'Mid-High\n(~1.1)', 'High\n(~1.9)']
    
    benchmarks = ['Peptide Identification', 'Peptide Grouping']
    method_sets = {
        'Peptide Identification': ['COPF', 'PeCorA', 'ProteoForge'],
        'Peptide Grouping': ['COPF', 'ProteoForge']
    }
    
    for i, benchmark in enumerate(benchmarks):
        ax = axs[i]
        methods = method_sets[benchmark]
        
        # Prepare data for bar chart
        x_positions = np.arange(len(key_magnitudes))
        width = 0.8 / len(methods)
        
        for m_idx, method in enumerate(methods):
            method_scores = []
            
            for magnitude in key_magnitudes:
                method_data = sim3_summary[
                    (sim3_summary['Method'] == method) & 
                    (sim3_summary['Benchmark'] == benchmark) &
                    (sim3_summary['Magnitude'] == magnitude)
                ]
                
                if len(method_data) > 0:
                    method_scores.append(method_data['MCC'].iloc[0])
                else:
                    method_scores.append(0)
            
            # Plot bars
            color = method_palette.get(method, 'black')
            x_pos = x_positions + (m_idx - len(methods)/2 + 0.5) * width
            
            bars = ax.bar(x_pos, method_scores, width, 
                         color=color, alpha=0.8, label=method,
                         edgecolor='white', linewidth=0.5)
            
            # Add value labels on bars
            for bar, score in zip(bars, method_scores):
                if score > 0.05:  # Only show if meaningful
                    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.02,
                           f'{score:.2f}', ha='center', va='bottom', 
                           fontsize=6, fontweight='bold', color=color)
        
        # Formatting
        ax.set_title(benchmark, fontsize=10, fontweight='bold', loc='left')
        ax.set_ylabel('MCC Score', fontsize=9)
        ax.set_ylim(0, 1.1)
        ax.set_xticks(x_positions)
        ax.grid(True, alpha=0.3, axis='y', linestyle='--')
        
        if i == 0:
            ax.legend(bbox_to_anchor=(1.02, 1), loc='upper left', fontsize=8)
        
        if i == 1:  # Bottom plot
            ax.set_xlabel('Perturbation Magnitude Level', fontsize=9)
            ax.set_xticklabels(mag_labels, fontsize=8)
        else:
            ax.set_xticklabels([])
    
    plt.tight_layout()
    return fig

create_detection_efficiency_bars()
plt.show()

Key Findings — Simulation 3:

  • MCC increases monotonically with perturbation magnitude across all methods
  • Practical detection threshold: ~0.5 log2 (MCC improves sharply above this level)
  • ProteoForge shows strong sensitivity for the Peptide Identification benchmark
  • Methods converge at high magnitudes—differences are most pronounced for subtle perturbations
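
The "~0.5 log2 practical threshold" can also be estimated programmatically from a summary table shaped like sim3_summary. The sketch below uses made-up MCC values standing in for one method's curve and interpolates the magnitude at which MCC first reaches half of its plateau:

```python
import numpy as np

# Hypothetical MCC-vs-magnitude curve for one method (illustrative values,
# not the actual benchmark results; magnitudes match the Sim3 levels).
magnitudes = np.array([0.175, 0.375, 0.625, 0.875, 1.125, 1.375, 1.625, 1.875])
mcc = np.array([0.03, 0.15, 0.42, 0.58, 0.66, 0.70, 0.72, 0.73])

# Define the practical detection threshold as the magnitude at which MCC
# reaches half of the plateau (here: half of the final value). np.interp
# requires a monotonically increasing xp, which holds for this toy curve.
half_plateau = mcc[-1] / 2
threshold = np.interp(half_plateau, mcc, magnitudes)
print(f"half-plateau MCC = {half_plateau:.3f}, threshold ≈ {threshold:.2f} log2")
```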

2.5 Simulation 4: Complex Multi-Variable Experimental Designs

Objective: Evaluate method robustness across experimental designs with multiple interacting factors.

Design Factors:

  • Number of conditions: 2, 4, 6, or 8 experimental groups
  • Overlap pattern: Same peptides perturbed across conditions (True) vs. different peptides (False)
  • Direction: All perturbations in same direction vs. random up/down changes
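
Crossing these three factors yields 4 × 2 × 2 = 16 design cells per method and benchmark. A quick enumeration (factor levels inferred from the description above; the exact level names in the data may differ):

```python
from itertools import product

# Enumerate the full factorial design of Simulation 4.
n_conditions = [2, 4, 6, 8]
overlap = [True, False]
direction = ['same', 'random']

design_cells = list(product(n_conditions, overlap, direction))
print(len(design_cells))  # 16 combinations
print(design_cells[0])    # (2, True, 'same')
```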

Key Questions:

  • Does adding more experimental conditions improve detection?
  • Which perturbation patterns are easier to detect?
# Load and explore Sim4 data (multi-variable experimental designs)
sim4_pepid = pd.read_feather('./data/Sim4/4_Sim4_PeptideIdentification_PerformanceData.feather')
sim4_pepid['Benchmark'] = 'Peptide Identification'
sim4_pepgrp = pd.read_feather('./data/Sim4/4_Sim4_Grouping_PerformanceData.feather')
sim4_pepgrp['Benchmark'] = 'Peptide Grouping'
sim4_combined = pd.concat([sim4_pepid, sim4_pepgrp], ignore_index=True)

sim4_summary = sim4_combined.groupby([
        'Method', 'Benchmark', 'Overlap', 'Direction', 'N_Conditions'
    ])['MCC'].mean().unstack().reset_index()
# sim4_summary

Robustness Across Experimental Setups

Line plots showing MCC vs. number of conditions, with line styles encoding overlap (solid=True, dashed=False) and markers encoding direction (●=random, ■=same).

from matplotlib.lines import Line2D

fig, axes = plt.subplots(2, 1, figsize=(5.0, 5.0), sharex=True)
fig.suptitle('Method Robustness Across Experimental Setups', fontsize=10, fontweight='bold', y=0.98)

# Aggregate
agg = sim4_combined.groupby(['Benchmark','Method','Overlap','Direction','N_Conditions'])['MCC'].mean().reset_index()

# Order methods by overall mean performance so best are drawn last (on top)
method_order = agg.groupby('Method')['MCC'].mean().sort_values(ascending=True).index.tolist()

benchmarks = ['Peptide Identification', 'Peptide Grouping']
direction_marker = {'random': 'o', 'same': 's'}
overlap_ls = {True: '-', False: '--'}

# Plot each benchmark in its own small panel
for i, benchmark in enumerate(benchmarks):
    ax = axes[i]
    sub = agg[agg['Benchmark'] == benchmark].copy()
    # draw in method order so higher performers are on top
    for method in method_order:
        sub_m = sub[sub['Method'] == method]
        if sub_m.empty:
            continue
        # within method, draw combinations of overlap/direction
        combos = sorted(sub_m.groupby(['Overlap','Direction']), key=lambda x: (x[0][0], x[0][1]))
        for (overlap, direction), grp in combos:
            grp = grp.sort_values('N_Conditions')
            color = method_palette.get(method, '#333333')
            ls = overlap_ls.get(overlap, '-')
            mk = direction_marker.get(direction, 'o')
            # Plot line with clearly visible marker edge for small sizes
            ax.plot(
                grp['N_Conditions'], grp['MCC'],
                color=color, linestyle=ls, marker=mk,
                markersize=5, markeredgecolor='black', markeredgewidth=0.7,
                linewidth=1.6, alpha=0.95, zorder=3
            )
            # small endpoint label (method short) to aid reading in compact panel
            last_x = grp['N_Conditions'].iloc[-1]
            last_y = grp['MCC'].iloc[-1]
            ax.text(
                last_x + 0.12, last_y,
                method if (overlap, direction) == (True, 'random') else "",
                fontsize=10, fontweight='bold', color=color, va='center'
            )

    # Styling for compact embedding
    ax.set_title(benchmark, fontsize=9, fontweight='bold', loc='left')
    ax.set_ylabel('MCC', fontsize=8)
    ax.set_ylim(-0.05, 0.88)
    ax.grid(axis='y', linestyle=':', linewidth=0.6, alpha=0.5)
    ax.tick_params(axis='both', which='major', labelsize=7)
    sns.despine(ax=ax)

    # # Panel label (A / B) for later placement in multi-panel figures
    # panel_label = 'A' if i == 0 else 'B'
    # ax.text(0.01, 0.98, panel_label, transform=ax.transAxes,
    #         fontsize=9, fontweight='bold', va='top', ha='left',
    #         bbox=dict(boxstyle='round,pad=0.15', facecolor='white', edgecolor='none', alpha=0.8))

# X-axis formatting (shared)
unique_conds = sorted(agg['N_Conditions'].unique())
axes[-1].set_xticks(unique_conds)
axes[-1].set_xlabel('Number of Conditions', fontsize=8)
axes[-1].tick_params(axis='x', labelsize=7)

# Build compact legends (method color, overlap linestyle, direction marker)
methods_present = [m for m in method_order if m in agg['Method'].unique()]
method_handles = [
    Line2D([0], [0], color=method_palette.get(m, 'gray'), lw=2.5, label=m)
    for m in methods_present
]
overlap_handles = [
    Line2D([0], [0], color='gray', lw=1.8, linestyle='-', label='Overlap: True'),
    Line2D([0], [0], color='gray', lw=1.8, linestyle='--', label='Overlap: False'),
]
direction_handles = [
    Line2D([0], [0], color='black', marker='o', linestyle='None', markersize=5, label='Direction: random'),
    Line2D([0], [0], color='black', marker='s', linestyle='None', markersize=5, label='Direction: same'),
]

# # Place method legend in top-left of first panel and style legend for patterns in second
# leg1 = axes[0].legend(handles=method_handles, title='Method', loc='upper left',
#                       fontsize=7, title_fontsize=8, frameon=False, ncol=1)
# axes[0].add_artist(leg1)
axes[1].legend(handles=overlap_handles + direction_handles, loc='upper left',
               fontsize=7, frameon=False, ncol=1)

plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.show()
plots.finalize_plot(
    fig, show=True, save=save_to_folder,
    filename='benchmark_sim4_experimental_setups',
    filepath=figure_path,
    formats=figure_formats, 
    transparent=transparent_bg,
    dpi=figure_dpi,
)

Key Findings — Simulation 4:

  • A higher number of conditions helps peptide identification for linear-model-based methods such as PeCorA and ProteoForge
  • For peptide identification, PeCorA and ProteoForge show extremely similar patterns across setups, differing mainly in MCC level because they operate on the missing vs. imputed versions of the data
  • Allowing overlap improves performance, likely because it yields fewer predicted peptide groups, which are then easier to identify correctly
  • The direction of the perturbations (same vs. random) has less effect than the other factors
  • ProteoForge consistently outperforms the other methods across design configurations
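
The overlap effect can be quantified directly from a table shaped like sim4_combined; the toy DataFrame below reuses the notebook's column names, but its values are invented for illustration:

```python
import pandas as pd

# Toy stand-in for sim4_combined (column names match the notebook,
# MCC values are invented for illustration).
df = pd.DataFrame({
    'Method': ['ProteoForge'] * 4,
    'Overlap': [True, True, False, False],
    'N_Conditions': [2, 4, 2, 4],
    'MCC': [0.62, 0.70, 0.55, 0.64],
})

# Mean MCC per overlap setting: a positive difference supports the finding
# that overlapping perturbations are easier to detect.
by_overlap = df.groupby('Overlap')['MCC'].mean()
print(f"{by_overlap[True] - by_overlap[False]:.3f}")  # 0.065 in this toy example
```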

Summary and Figure Assembly

Generated Figure Panels

Panel Source Description File
Panel | Source   | Description                             | File
2A-B  | SWATH-MS | MCC barplots: Identification & Grouping | benchmark_compact_fig2ab.pdf
2C    | Sim1     | Imputation impact (vertical bars)       | benchmark_sim1_imputation_effect.pdf
2D    | Sim2     | Missingness degradation (line plot)     | benchmark_sim2_missingness_effect.pdf
2E    | Sim3     | Detection sensitivity curves            | benchmark_sim3_perturbation_detection.pdf
2F    | Sim4     | Experimental design robustness          | benchmark_sim4_experimental_setups.pdf

Summary of Benchmark Findings

  1. Real-world performance (SWATH-MS): ProteoForge achieves competitive MCC across perturbation scenarios, with strongest relative performance in identification tasks.

  2. Imputation robustness (Sim1): ProteoForge shows minimal performance impact from imputation and is robust when missing values are handled appropriately.

  3. Missingness tolerance (Sim2): ~40% missingness threshold before substantial degradation; all methods show similar sensitivity patterns.

  4. Detection sensitivity (Sim3): ~0.5 log2 perturbation magnitude marks the practical detection threshold; larger effects are consistently detected.

  5. Design complexity (Sim4): More conditions improve detection; overlapping perturbation patterns offer detection advantages.

Note: Figures exported as PDF vectors for final assembly in Inkscape. Panel labels and minor styling adjustments to be applied during composition.

print("Notebook Execution Time:", utils.prettyTimer(utils.getTime() - startTime))
Notebook Execution Time: 00h:00m:03s